from sympy import *
We derive an approximate pdf for the invariant mass distribution of combinatorial background. Combinatorial background consists of random pairs of unrelated particles that are combined to compute the invariant mass of a hypothetical common ancestor particle.
It is known empirically that an exponential distribution often describes this background well. We show here why that is. We work in natural units where \(c=1\).
We express the 3-momentum of a particle in the coordinates \(p_T, \eta, \phi\) (transverse momentum, pseudorapidity, azimuthal angle), since in these coordinates the probability distribution approximately factorizes:
\[ g(p_T) \, h_1(\phi) \, h_2(\eta) \]
where \(h_1(\phi)\) is a uniform distribution from \(-\pi\) to \(\pi\), and \(h_2(\eta)\) is a roughly bell-shaped pdf centered around 0, with a broadened core and a dip at zero. We will see that the result does not depend on \(h_2(\eta)\) within the limits of the approximation.
The distribution \(g(p_T)\) is approximately exponential for \(p_T \gg 1\,\text{GeV}\,c^{-1}\). We will assume it is exactly exponential; this is our first approximation.
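Explicitly, the exponential ansatz we use below is
\[ g(p_T) = \lambda \, e^{-\lambda p_T}, \]
with a slope parameter \(\lambda\) that we keep symbolic throughout.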
We now express the invariant mass \(m\) of the hypothetical ancestor particle in these coordinates.
def ancestor_mass(pt1, eta1, phi1, pt2, eta2, phi2, m1, m2):
    # see e.g. Wikipedia https://en.wikipedia.org/wiki/Pseudorapidity
    px1 = cos(phi1) * pt1
    py1 = sin(phi1) * pt1
    pz1 = sinh(eta1) * pt1

    px2 = cos(phi2) * pt2
    py2 = sin(phi2) * pt2
    pz2 = sinh(eta2) * pt2

    # momentum squared of ancestor
    p_sq = (px1 + px2) ** 2 + (py1 + py2) ** 2 + (pz1 + pz2) ** 2

    # momentum squared of particles 1 and 2
    p1_sq = px1 ** 2 + py1 ** 2 + pz1 ** 2
    p2_sq = px2 ** 2 + py2 ** 2 + pz2 ** 2

    # energies of particles 1 and 2
    e1 = sqrt(p1_sq + m1 ** 2)
    e2 = sqrt(p2_sq + m2 ** 2)

    return sqrt((e1 + e2) ** 2 - p_sq)
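As a quick sanity check (added here, not part of the original derivation): two massless particles with equal \(p_T\), opposite in \(\phi\), and at \(\eta = 0\) must reconstruct to \(m = 2 p_T\).

# hypothetical check: back-to-back massless pair at eta = 0 gives m = 2 * pt
assert ancestor_mass(1, 0, 0, 1, 0, pi, 0, 0) == 2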
We use SymPy to transform and simplify the formula for the ancestor mass. First we declare the variables.
= symbols("p_T_1 p_T_2 m_1 m_2 lambda m", positive=True, real=True)
pt1, pt2, m1, m2, lambd, m
= symbols("eta_1 eta_2 phi_1 phi_2", real=True) eta1, eta2, phi1, phi2
ancestor_mass(pt1, eta1, phi1, pt2, eta2, phi2, m1, m2)
\(\displaystyle \sqrt{- \left(p_{T 1} \sin{\left(\phi_{1} \right)} + p_{T 2} \sin{\left(\phi_{2} \right)}\right)^{2} - \left(p_{T 1} \cos{\left(\phi_{1} \right)} + p_{T 2} \cos{\left(\phi_{2} \right)}\right)^{2} - \left(p_{T 1} \sinh{\left(\eta_{1} \right)} + p_{T 2} \sinh{\left(\eta_{2} \right)}\right)^{2} + \left(\sqrt{m_{1}^{2} + p_{T 1}^{2} \sin^{2}{\left(\phi_{1} \right)} + p_{T 1}^{2} \cos^{2}{\left(\phi_{1} \right)} + p_{T 1}^{2} \sinh^{2}{\left(\eta_{1} \right)}} + \sqrt{m_{2}^{2} + p_{T 2}^{2} \sin^{2}{\left(\phi_{2} \right)} + p_{T 2}^{2} \cos^{2}{\left(\phi_{2} \right)} + p_{T 2}^{2} \sinh^{2}{\left(\eta_{2} \right)}}\right)^{2}}\)
The raw expression for the ancestor mass is rather complex. We ask SymPy to simplify it.
m_expr = ancestor_mass(pt1, eta1, phi1, pt2, eta2, phi2, m1, m2).simplify(); m_expr
\(\displaystyle \sqrt{m_{1}^{2} + m_{2}^{2} - 2 p_{T 1} p_{T 2} \cos{\left(\phi_{1} - \phi_{2} \right)} - 2 p_{T 1} p_{T 2} \sinh{\left(\eta_{1} \right)} \sinh{\left(\eta_{2} \right)} + 2 \sqrt{m_{1}^{2} + p_{T 1}^{2} \cosh^{2}{\left(\eta_{1} \right)}} \sqrt{m_{2}^{2} + p_{T 2}^{2} \cosh^{2}{\left(\eta_{2} \right)}}}\)
This looks much better.
We now introduce the second approximation. We assume that \(m_i \ll p_{T,i}\), which plays along nicely with the first approximation \(p_{T,i} \gg 1\,\text{GeV}c^{-1}\) if the outgoing particles are light. In other words, this approximation can be expected to work well for decays into particles with masses \(m_i \ll 1\,\text{GeV}c^{-2}\).
We expand the expression in a Taylor series in the masses \(m_i\). The first-order terms vanish (the expression depends only on \(m_i^2\)), so to this order we simply get the massless limit:
m_approx_expr = m_expr.subs(m1, 0).subs(m2, 0).simplify(); Eq(m, m_approx_expr)
\(\displaystyle m = \sqrt{2} \sqrt{p_{T 1}} \sqrt{p_{T 2}} \sqrt{- \cos{\left(\phi_{1} - \phi_{2} \right)} + \cosh{\left(\eta_{1} - \eta_{2} \right)}}\)
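Squaring this reproduces the familiar massless limit \(m^2 = 2\, p_{T,1}\, p_{T,2} \left(\cosh\Delta\eta - \cos\Delta\phi\right)\); a check added here confirms it:

assert simplify(m_approx_expr ** 2 - 2 * pt1 * pt2 * (cosh(eta1 - eta2) - cos(phi1 - phi2))) == 0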
To compute the pdf of the ancestor mass \(m\), we need to perform a change of variable and then integrate over the remaining coordinates.
We first make the change of variables.
The old variables are \(p_{T,1}, p_{T,2}, \eta_1, \eta_2, \phi_1, \phi_2\).
The new variables are \(m, p_{T,2}, \eta_1, \eta_2, \phi_1, \phi_2\).
The pdf in the new variables is
\[ f'(m, p_{T,2}, \eta_1, \eta_2, \phi_1, \phi_2) = g(p_{T,1}) \, g(p_{T,2}) \, h_1(\phi_1) \, h_1(\phi_2) \, h_2(\eta_1) \, h_2(\eta_2) \, \left|\frac{\partial(m, p_{T,2}, \eta_1, \eta_2, \phi_1, \phi_2)}{\partial(p_{T,1}, p_{T,2}, \eta_1, \eta_2, \phi_1, \phi_2)} \right|^{-1} \\ f'(m, p_{T,2}, \eta_1, \eta_2, \phi_1, \phi_2) = g(p_{T,1}(m, p_{T,2}, \dots)) \, g(p_{T,2}) \, h_1(\phi_1) \, h_1(\phi_2) \, h_2(\eta_1) \, h_2(\eta_2) \, \left| \frac{\partial m}{\partial p_{T, 1}} \right|^{-1} \]
We manually express \(\sqrt{p_{T, 1}}\) as a function of \(m\), to replace it later. The SymPy solver has trouble doing this, as it tends to break down the \(\cosh\) into exponentials, but we want to keep it intact.
sqrt_pt1 = m / m_approx_expr * sqrt(pt1); Eq(sqrt(pt1), sqrt_pt1)
\(\displaystyle \sqrt{p_{T 1}} = \frac{\sqrt{2} m}{2 \sqrt{p_{T 2}} \sqrt{- \cos{\left(\phi_{1} - \phi_{2} \right)} + \cosh{\left(\eta_{1} - \eta_{2} \right)}}}\)
pt1_m = sqrt_pt1 ** 2; Eq(pt1, pt1_m)
\(\displaystyle p_{T 1} = \frac{m^{2}}{2 p_{T 2} \left(- \cos{\left(\phi_{1} - \phi_{2} \right)} + \cosh{\left(\eta_{1} - \eta_{2} \right)}\right)}\)
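A consistency check (added): since \(m > 0\), it suffices to verify that substituting this back into the approximate mass formula reproduces \(m^2\):

assert simplify(m_approx_expr.subs(pt1, pt1_m) ** 2 - m ** 2) == 0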
We compute \(\left| \frac{\partial m}{\partial p_{T, 1}} \right|^{-1}\) and express it as a function of \(m\).
dpt1_dm = 1/diff(m_approx_expr, pt1); dpt1_dm
\(\displaystyle \frac{\sqrt{2} \sqrt{p_{T 1}}}{\sqrt{p_{T 2}} \sqrt{- \cos{\left(\phi_{1} - \phi_{2} \right)} + \cosh{\left(\eta_{1} - \eta_{2} \right)}}}\)
dpt1_dm = dpt1_dm.subs(sqrt(pt1), sqrt_pt1); dpt1_dm
\(\displaystyle \frac{m}{p_{T 2} \left(- \cos{\left(\phi_{1} - \phi_{2} \right)} + \cosh{\left(\eta_{1} - \eta_{2} \right)}\right)}\)
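As a cross-check (added), this must equal \(\partial p_{T,1} / \partial m\) obtained by differentiating \(p_{T,1}(m, \dots)\) directly:

assert simplify(diff(pt1_m, m) - dpt1_dm) == 0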
# exponential pdf
def g(pt):
    return lambd * exp(-lambd * pt)

# uniform pdf from -pi to pi
def h1(phi):
    return 1/(2 * pi)

# unspecified pdf for eta
h2 = Function("h2")

f_prime = simplify(dpt1_dm * g(pt1_m) * g(pt2) * h1(phi1) * h1(phi2) * h2(eta1) * h2(eta2)); f_prime
\(\displaystyle - \frac{\lambda^{2} m \operatorname{h_{2}}{\left(\eta_{1} \right)} \operatorname{h_{2}}{\left(\eta_{2} \right)} e^{\lambda \left(\frac{m^{2}}{2 p_{T 2} \left(\cos{\left(\phi_{1} - \phi_{2} \right)} - \cosh{\left(\eta_{1} - \eta_{2} \right)}\right)} - p_{T 2}\right)}}{4 \pi^{2} p_{T 2} \left(\cos{\left(\phi_{1} - \phi_{2} \right)} - \cosh{\left(\eta_{1} - \eta_{2} \right)}\right)}\)
To obtain \(f(m)\), we need to integrate \(f'\) over \(p_{T,2}, \phi_1, \phi_2, \eta_1, \eta_2\). (Despite the leading minus sign, the expression above is positive, since \(\cos(\phi_1 - \phi_2) - \cosh(\eta_1 - \eta_2) \le 0\).)
Integral(f_prime, (pt2, 0, oo),
         (eta1, "eta_a", "eta_b"), (eta2, "eta_a", "eta_b"),
         (phi1, -pi, pi), (phi2, -pi, pi))
\(\displaystyle \int\limits_{- \pi}^{\pi}\int\limits_{- \pi}^{\pi}\int\limits_{\eta_{a}}^{\eta_{b}}\int\limits_{\eta_{a}}^{\eta_{b}}\int\limits_{0}^{\infty} \left(- \frac{\lambda^{2} m \operatorname{h_{2}}{\left(\eta_{1} \right)} \operatorname{h_{2}}{\left(\eta_{2} \right)} e^{\lambda \left(\frac{m^{2}}{2 p_{T 2} \left(\cos{\left(\phi_{1} - \phi_{2} \right)} - \cosh{\left(\eta_{1} - \eta_{2} \right)}\right)} - p_{T 2}\right)}}{4 \pi^{2} p_{T 2} \left(\cos{\left(\phi_{1} - \phi_{2} \right)} - \cosh{\left(\eta_{1} - \eta_{2} \right)}\right)}\right)\, dp_{T 2}\, d\eta_{1}\, d\eta_{2}\, d\phi_{1}\, d\phi_{2}\)
This integral does not look analytically tractable. Since it contains an exponential function, Laplace’s method comes to mind as an approximation. To apply it, the function \(g(\vec x)\) in the exponential \(e^{M g(\vec x)}\) must have a global maximum inside the integration bounds. We investigate \(g(\vec x)\) numerically to see whether that is the case.
import numpy as np
import matplotlib.pyplot as plt

def g(x, y, z):
    # the exponent as a function of x = p_T2, y = delta(eta), z = delta(phi),
    # with lambda = 1 and m^2/2 = 1; the tiny constant avoids division by zero
    return 1/(x * (np.cos(z) - np.cosh(y) + 1e-100)) - x

xn = np.linspace(0.05, 3, 100)
yn = np.linspace(2, 5)
zn = np.linspace(-2 * np.pi, 2 * np.pi)

fig, ax = plt.subplots(1, 3, figsize=(14, 4))
plt.sca(ax[0])
Y, Z = np.meshgrid(yn, zn)
plt.pcolormesh(Y, Z, g(0.5, Y, Z), shading="nearest")
plt.colorbar()
plt.sca(ax[1])
X, Y = np.meshgrid(xn, yn)
plt.pcolormesh(X, Y, g(X, Y, 0), shading="nearest")
plt.colorbar()
plt.sca(ax[2])
for yi in yn[::10]:
    plt.plot(xn, g(xn, yi, 0))
There is indeed a global maximum as a function of \(p_{T,2}\) between 0 and infinity, but not with respect to \(\Delta\eta\). The maximum for \(\Delta\eta\) is located at the upper bound of the \(\eta\) interval, which means that Laplace’s method cannot be applied to the integral as a whole.
We cannot solve the integral completely, so we do the next-best thing. We partially solve it by approximately carrying out the integration over \(p_{T,2}\) for each value of \(C(\Delta\eta, \Delta\phi) = \cosh\Delta\eta - \cos\Delta\phi\) with Laplace’s method:
\[ \int h(x) e^{M g(x)} \text{d}x \approx \sqrt{\frac{2\pi}{M |g''(x_0)|}} h(x_0) e^{M g(x_0)} \]
with \(M > 0, M \rightarrow \infty, h(x) > 0\) and \(x_0\) being the maximum of \(g(x)\).
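As a small numeric illustration of the method (added; assumes SciPy is available), take \(h(x) = 1\) and \(g(x) = \ln x - x\), which has its maximum at \(x_0 = 1\) with \(g''(x_0) = -1\); Laplace's formula then reproduces Stirling's approximation of \(\int_0^\infty e^{M(\ln x - x)}\,\mathrm{d}x = \Gamma(M+1)/M^{M+1}\):

import numpy as np
from scipy.integrate import quad

M_test = 20.0
exact, _ = quad(lambda t: np.exp(M_test * (np.log(t) - t)), 0, np.inf)
laplace = np.sqrt(2 * np.pi / M_test) * np.exp(-M_test)
print(exact / laplace)  # -> about 1.004; the relative error scales like 1/(12 M)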
To obtain simpler formulas, we make a bunch of substitutions:
- \(x := p_{T,2}\)
- \(\Delta \eta := \eta_1 - \eta_2\)
- \(\Delta \phi := \phi_1 - \phi_2\)
- \(C(\Delta\eta, \Delta\phi) := \cosh \Delta\eta - \cos\Delta\phi\)
= symbols(r"\Delta\eta \Delta\phi")
deta, dphi
= symbols("x", real=True, positive=True)
x = Function("C", real=True, positive=True)(deta, dphi)
C
= f_prime.subs((cosh(eta1 - eta2)-cos(phi1 - phi2)), C).subs(eta1, eta2 + deta).subs(phi1, phi2 + dphi).subs(pt2, x).simplify(); f_prime f_prime
\(\displaystyle \frac{\lambda^{2} m \operatorname{h_{2}}{\left(\eta_{2} \right)} \operatorname{h_{2}}{\left(\Delta\eta + \eta_{2} \right)} e^{- \lambda \left(\frac{m^{2}}{2 x C{\left(\Delta\eta,\Delta\phi \right)}} + x\right)}}{4 \pi^{2} x C{\left(\Delta\eta,\Delta\phi \right)}}\)
M = lambd
g = -m ** 2 / (2 * x * C) - x; g
\(\displaystyle - \frac{m^{2}}{2 x C{\left(\Delta\eta,\Delta\phi \right)}} - x\)
h = f_prime / exp(M * g).simplify(); h
\(\displaystyle \frac{\lambda^{2} m \operatorname{h_{2}}{\left(\eta_{2} \right)} \operatorname{h_{2}}{\left(\Delta\eta + \eta_{2} \right)}}{4 \pi^{2} x C{\left(\Delta\eta,\Delta\phi \right)}}\)
assert simplify(h * exp(M * g)) == f_prime
gx = diff(g, x); gx
\(\displaystyle \frac{m^{2}}{2 x^{2} C{\left(\Delta\eta,\Delta\phi \right)}} - 1\)
gxx = diff(g, x, 2); gxx
\(\displaystyle - \frac{m^{2}}{x^{3} C{\left(\Delta\eta,\Delta\phi \right)}}\)
x0 = solve(Eq(gx, 0), x)[0]; x0  # there is only one solution, since x > 0
\(\displaystyle \frac{\sqrt{2} m}{2 \sqrt{C{\left(\Delta\eta,\Delta\phi \right)}}}\)
gxx0 = gxx.subs(x, x0); gxx0
\(\displaystyle - \frac{2 \sqrt{2} \sqrt{C{\left(\Delta\eta,\Delta\phi \right)}}}{m}\)
h0 = h.subs(x, x0); h0
\(\displaystyle \frac{\sqrt{2} \lambda^{2} \operatorname{h_{2}}{\left(\eta_{2} \right)} \operatorname{h_{2}}{\left(\Delta\eta + \eta_{2} \right)}}{4 \pi^{2} \sqrt{C{\left(\Delta\eta,\Delta\phi \right)}}}\)
g0 = g.subs(x, x0); g0
\(\displaystyle - \frac{\sqrt{2} m}{\sqrt{C{\left(\Delta\eta,\Delta\phi \right)}}}\)
Now we have everything to get Laplace’s approximate solution.
sqrt(2 * pi / (M * -gxx0)) * h0 * exp(M * g0).simplify()
\(\displaystyle \frac{\sqrt[4]{2} \lambda^{\frac{3}{2}} \sqrt{m} \operatorname{h_{2}}{\left(\eta_{2} \right)} \operatorname{h_{2}}{\left(\Delta\eta + \eta_{2} \right)} e^{- \frac{\sqrt{2} \lambda m}{\sqrt{C{\left(\Delta\eta,\Delta\phi \right)}}}}}{4 \pi^{\frac{3}{2}} C^{\frac{3}{4}}{\left(\Delta\eta,\Delta\phi \right)}}\)
In other words, the approximate solution for each value of \(C(\Delta\eta, \Delta\phi)\) is
\[ f(m, C) \propto m^{1/2} \, e^{- m \, \lambda_C} \]
where \(\lambda_C\) is proportional to \(C^{-1/2}\). Since \(C^{-1/2}\) varies smoothly and slowly (see the plot below), the actual function \(f(m)\) will be a superposition of many \(f(m, C)\) with similar slopes.
import numpy as np
import matplotlib.pyplot as plt

yn = np.linspace(2, 5)
plt.plot(yn, 1/np.sqrt(np.cosh(yn) - 1), label=r"$C^{-1/2}(\eta, 0)$")
plt.legend()
plt.xlabel(r"$\eta$");
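We can also cross-check the Laplace approximation of the \(p_{T,2}\) integral numerically (added; the values of \(\lambda\), \(m\), and \(C\) are arbitrary test choices, and SciPy is assumed to be available):

from scipy.integrate import quad

lam, mv, Cv = 1.0, 5.0, 1.5  # arbitrary test values for lambda, m, C
# integrand proportional to f', keeping only the x-dependent factors
exact, _ = quad(lambda t: np.exp(-lam * (mv ** 2 / (2 * t * Cv) + t)) / t, 0, np.inf)
x0v = mv / np.sqrt(2 * Cv)                 # location of the maximum
gxx0v = 2 * np.sqrt(2) * np.sqrt(Cv) / mv  # |g''(x0)|
laplace = np.sqrt(2 * np.pi / (lam * gxx0v)) / x0v * np.exp(-lam * np.sqrt(2) * mv / np.sqrt(Cv))
print(exact / laplace)  # -> close to 1 (agrees to a few percent)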
Simulation study
We confirm the analytical results in a toy simulation. We first generate some toy data. The \(p_T\) distribution is exponential; the distributions in \(\eta\) and \(\phi\) are uniform. We then form all unique pairs and compute the invariant mass, assuming the candidate masses are zero.
import numpy as np

rng = np.random.default_rng(1)

pt = rng.exponential(size=1000)
eta = rng.uniform(2, 5, size=1000)
phi = rng.uniform(-np.pi, np.pi, size=1000)

f = lambdify((pt1, eta1, phi1, pt2, eta2, phi2), m_approx_expr, "math")

# this is slow in pure Python, but we don't care for a one-shot
m = []
for i in range(1000):
    for j in range(i):
        m.append(f(pt[i], eta[i], phi[i], pt[j], eta[j], phi[j]))
m = np.array(m)
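For larger samples, a vectorized version (a sketch added here, writing out m_approx_expr explicitly) gives the same masses much faster:

# hypothetical vectorized alternative: evaluate the approximate mass formula
# m = sqrt(2 pT1 pT2 (cosh(deta) - cos(dphi))) for all unique pairs at once
i, j = np.triu_indices(len(pt), k=1)
m_fast = np.sqrt(2 * pt[i] * pt[j] * (np.cosh(eta[i] - eta[j]) - np.cos(phi[i] - phi[j])))
assert np.allclose(np.sort(m_fast), np.sort(m))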
We make a histogram of the resulting distribution.
n, xe = np.histogram(m, bins=50)

from pyik.mplext import plot_hist

plot_hist(xe, n)
plt.xlabel("m")
plt.ylabel("counts")
plt.semilogy();
It looks approximately like an exponential, but there are differences. The peak at zero is cut off and the slope varies, being steeper near zero and flatter in the tail. In a narrow interval, however, the exponential distribution is expected to give a good description.
We fit a simple exponential model and a more complex model that takes our analytical solution into account. The latter is a superposition of three distributions of the form \(A_i \sqrt{m}\, e^{-B_i m}\), which approximates the integral that we cannot compute analytically.
from iminuit import Minuit
from iminuit.cost import LeastSquares

def model_A(x, a, b):
    return a * np.exp(-x * b)

def model_B(x, par):
    # first half of par are the amplitudes A_i, second half the slopes B_i
    npar = len(par) // 2
    par = np.asarray(par)[:, None]
    return x ** 0.5 * np.sum(par[:npar] * np.exp(-x * par[npar:]), axis=0)

cx = (xe[1:] + xe[:-1]) / 2
nerr = (n + 0.5) ** 0.5

min1 = Minuit(LeastSquares(cx, n, nerr, model_A), 1, 1)
min1.limits = (0, None)
min1.migrad()
FCN = 7712, Nfcn = 177, EDM = 4.59e-07 (Goal: 0.0002). Valid minimum, valid parameters, no parameters at limit; covariance from Hesse is accurate and positive definite.

| | Name | Value | Hesse Error | Limit- |
|---|------|----------|-------------|--------|
| 0 | a | 153.21e3 | 0.30e3 | 0 |
| 1 | b | 691.0e-3 | 0.9e-3 | 0 |

Covariance matrix (correlations in parentheses):

| | a | b |
|---|----------|----------|
| a | 8.84e+04 | 0.182 (0.678) |
| b | 0.182 (0.678) | 8.14e-07 |
min2 = Minuit(LeastSquares(cx, n, nerr, model_B), np.ones(6))
min2.limits = (0, None)
min2.migrad(ncall=10000)
FCN = 149, Nfcn = 1583, EDM = 8.62e-06 (Goal: 0.0002). Valid minimum, valid parameters, no parameters at limit; covariance from Hesse is accurate and positive definite.

| | Name | Value | Hesse Error | Limit- |
|---|------|-------|-------------|--------|
| 0 | x0 | 3.2e3 | 1.5e3 | 0 |
| 1 | x1 | 197e3 | 6e3 | 0 |
| 2 | x2 | 123e3 | 5e3 | 0 |
| 3 | x3 | 0.500 | 0.033 | 0 |
| 4 | x4 | 1.744 | 0.033 | 0 |
| 5 | x5 | 0.864 | 0.017 | 0 |

Covariance matrix (correlations in parentheses):

| | x0 | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|
| x0 | 2.35e+06 | -6.8e+06 (-0.769) | 4.87e+06 (0.607) | 49.3 (0.982) | 29.4 (0.583) | 23.5 (0.900) |
| x1 | -6.8e+06 (-0.769) | 3.32e+07 | -2.85e+07 (-0.946) | -128 (-0.680) | -163 (-0.859) | -93.8 (-0.956) |
| x2 | 4.87e+06 (0.607) | -2.85e+07 (-0.946) | 2.74e+07 | 86.3 (0.503) | 165 (0.958) | 78.7 (0.884) |
| x3 | 49.3 (0.982) | -128 (-0.680) | 86.3 (0.503) | 0.00107 | 0.000538 (0.499) | 0.00046 (0.825) |
| x4 | 29.4 (0.583) | -163 (-0.859) | 165 (0.958) | 0.000538 (0.499) | 0.00108 | 0.000464 (0.830) |
| x5 | 23.5 (0.900) | -93.8 (-0.956) | 78.7 (0.884) | 0.00046 (0.825) | 0.000464 (0.830) | 0.00029 |
fig, ax = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)
plt.sca(ax[0])
plot_hist(xe, n)
xm = np.linspace(xe[0], xe[-1], 1000)
plt.plot(xm, model_A(xm, *min1.values), label=f"exponential: $\\chi^2/n_{{dof}} = {min1.fval:.1f}/{len(cx) - min1.nfit}$")
plt.plot(xm, model_B(xm, min2.values), label=f"superposition: $\\chi^2/n_{{dof}} = {min2.fval:.1f}/{len(cx) - min2.nfit}$")
plt.semilogy()
plt.legend()
plt.xlabel("m")
plt.sca(ax[1])
plot_hist(xe, n)
plt.plot(xm, model_A(xm, *min1.values))
plt.plot(xm, model_B(xm, min2.values))
plt.xlabel("m")
The sophisticated superposition model globally describes the distribution much better than the plain exponential model. The cut-off near zero is modeled by the term \(\sqrt{m}\), which suppresses the rise of the pure exponentials near zero. The superposition of three exponentials with different slopes is able to match the changing slope.
Even for the sophisticated model, the \(\chi^2\) does not look great, but that is because we use the simple approximate least-squares method here, in which the model is not integrated over each bin. This approximation is poor near \(m = 0\), where the distribution has a sharp peak.
For an approximation over a local interval not too close to \(m = 0\), a simple exponential distribution will also perform well, since locally one of the exponential distributions dominates.
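To illustrate this (an added sketch; the window \(2 < m < 6\) is an arbitrary choice inside the simulated range), we can refit the plain exponential in a restricted interval, which should yield a much better fit quality:

# hypothetical local fit: a single exponential in a window away from m = 0
mask = (cx > 2) & (cx < 6)
min3 = Minuit(LeastSquares(cx[mask], n[mask], nerr[mask], model_A), 1, 1)
min3.limits = (0, None)
min3.migrad()
print(f"chi2/ndof = {min3.fval:.1f}/{mask.sum() - min3.nfit}")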